feat(tools): add backward graph generation and validation tools by Dayuxiaoshui · Pull Request #711 · PaddlePaddle/GraphNet

Dayuxiaoshui · 2026-05-15T09:41:42Z

PR Overview

This PR fixes 4 critical issues in the backward_graph_extractor.py pipeline for generating backward computational graphs, adds a kernel_dedup.py tool for Triton kernel-level deduplication, and improves test_compiler.py compatibility with list-typed outputs from backward graphs.

Types of Samples Fixed

Issue 1: BatchNorm subgraphs crashing during backward graph generation

Affected samples: ultralytics/yolov6l_start2_end8_0, ultralytics/yolov9e-seg, and others containing BatchNorm layers.

Root cause: The original implementation uses module.train() mode, causing BatchNorm's running_mean/running_var to have requires_grad=True when passed to aot_module_simplified. However, _native_batch_norm_legit_no_training does not support gradient computation w.r.t. running_mean:

RuntimeError: not differentiable with respect to argument 'running_mean'

Fix: Switch to module.eval() mode and parse weight_meta.py original_name to identify running_mean/running_var/num_batches_tracked, excluding them from requires_grad.
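
The name-based filter described in this fix can be sketched roughly as follows. This is an illustration only, not the PR's code: the helpers `is_bn_buffer` and `select_grad_names` are hypothetical, and the sketch assumes the `original_name` values parsed from weight_meta.py are already available as a plain mapping from graph parameter name to original name (the real file layout is not shown in this PR description).

```python
# Final name components of BatchNorm buffers that must not receive gradients.
BN_BUFFER_SUFFIXES = ("running_mean", "running_var", "num_batches_tracked")


def is_bn_buffer(original_name: str) -> bool:
    """Return True if the last dotted component names a BatchNorm buffer."""
    last = original_name.rsplit(".", 1)[-1]
    return last in BN_BUFFER_SUFFIXES


def select_grad_names(original_names: dict[str, str]) -> list[str]:
    """Keep only graph parameter names that may be marked requires_grad=True.

    `original_names` maps a graph parameter name to its original_name
    (hypothetical layout; the real weight_meta.py structure may differ).
    """
    return [name for name, orig in original_names.items() if not is_bn_buffer(orig)]


# Made-up names for illustration.
example = {
    "p0": "backbone.conv1.weight",
    "p1": "backbone.bn1.weight",
    "p2": "backbone.bn1.running_mean",
    "p3": "backbone.bn1.running_var",
    "p4": "backbone.bn1.num_batches_tracked",
}
print(select_grad_names(example))
```

Matching on the last dotted component rather than on a substring avoids excluding an unrelated parameter whose name merely contains one of these words.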

Issue 2: Input tensors corrupted by inplace operations

Affected samples: All backward graph generation.

Root cause: The original code reuses raw input tensors directly. Inplace ops (e.g., add_) mutate leaf tensors, causing gradient computation anomalies.

Fix: Apply detach().clone() to all input tensors.

Issue 3: Backward graph list outputs unsupported by test_compiler

Affected samples: Backward graphs returning [tensor], e.g., mmpose/LiteHRNet-18_start2_end6_0.

Root cause: Backward graphs output [tensor] lists. test_compiler's _align_output_device and torch.equal comparison functions only handle Tensor, crashing on list types:

TypeError: equal(): argument 'input' (position 1) must be Tensor, not list

Fix: Add recursive handling of nested list/tuple structures in test_compiler's output alignment and comparison functions.
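
The recursive handling can be sketched as below. This is a minimal sketch, not the code in test_compiler.py: `map_nested` and `nested_equal` are hypothetical names, and plain numbers stand in for tensors so the example runs without PyTorch. With real tensors the leaf functions would presumably be something like a device move and `torch.equal`.

```python
from typing import Any, Callable


def map_nested(fn: Callable[[Any], Any], obj: Any) -> Any:
    """Apply fn to every leaf of a nested list/tuple, keeping list vs. tuple."""
    if isinstance(obj, (list, tuple)):
        return type(obj)(map_nested(fn, item) for item in obj)
    return fn(obj)


def nested_equal(a: Any, b: Any, leaf_equal: Callable[[Any, Any], bool]) -> bool:
    """Compare two nested list/tuple structures leaf by leaf.

    Shapes must match; a list and a tuple of equal leaves compare equal.
    """
    a_seq = isinstance(a, (list, tuple))
    b_seq = isinstance(b, (list, tuple))
    if a_seq or b_seq:
        if not (a_seq and b_seq) or len(a) != len(b):
            return False
        return all(nested_equal(x, y, leaf_equal) for x, y in zip(a, b))
    return leaf_equal(a, b)


# Numbers stand in for tensors in this illustration.
outputs = [1.0, (2.0, [3.0])]
aligned = map_nested(lambda x: x * 1, outputs)  # placeholder for a device move
print(nested_equal(outputs, aligned, lambda x, y: x == y))
```

Passing the leaf operation as a callable keeps the traversal independent of the tensor library, so the same recursion serves both alignment and comparison.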

Issue 4: Missing graph_hash.txt prevents kernel extraction

Affected: All backward graph samples — 0 kernels extracted after successful compilation.

Root cause: GraphExtractor does not generate graph_hash.txt when saving models. triton_kernel_extractor requires original_graph/graph_hash.txt to trigger extraction.

Fix: GraphExtractor now computes SHA256 of model.py and writes graph_hash.txt automatically.
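
A minimal sketch of the hashing step, assuming the digest is taken over the raw bytes of model.py and written as a hex string. `write_graph_hash` is a hypothetical helper name; the exact encoding and file formatting used by GraphExtractor are not shown in this description.

```python
import hashlib
import tempfile
from pathlib import Path


def write_graph_hash(model_dir: Path) -> str:
    """Hash model.py with SHA-256 and store the hex digest in graph_hash.txt."""
    digest = hashlib.sha256((model_dir / "model.py").read_bytes()).hexdigest()
    (model_dir / "graph_hash.txt").write_text(digest)
    return digest


# Demonstration on a throwaway directory with a stand-in model.py.
with tempfile.TemporaryDirectory() as tmp:
    sample_dir = Path(tmp)
    (sample_dir / "model.py").write_bytes(b"def forward(x):\n    return x\n")
    digest = write_graph_hash(sample_dir)
    stored = (sample_dir / "graph_hash.txt").read_text()

print(digest == stored)
```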

Success Rate: Before vs. After

Before this PR (original backward_graph_extractor):

Subgraphs with BatchNorm: 0% (all crash with running_mean gradient error)
Typical subgraphs without BN: ~60-70% (affected by inplace inputs and train mode)

After this PR:

| Subgraph Type | Samples | Success | Failed | Success Rate |
| --- | --- | --- | --- | --- |
| typical float32 | 30 | 30 | 0 | 100% |
| typical float32 | 50 | 50 | 0 | 100% |
| fusible float32 | 30 | 28 | 2 | 93.3% |
| fusible float32 | 50 | 42 | 8 | 84% |
| typical backward compile + extract | 20 | 15 | 5 | 75% |

87.5% of fusible failures are due to output tensors without requires_grad (e.g., int64 indices/masks), a structural characteristic of fusible decomposition, not a code bug.

test_compiler Verification

Before: test_compiler crashes on backward graphs (missing weight_meta + list output). After:

| Subgraph Type | Generated | test_compiler Passed | test_compiler Failed |
| --- | --- | --- | --- |
| typical float32 | 30 | 30 (100%) | 0 |
| fusible float32 | 28 | 28 (100%) | 0 |

Zero false positives: No "Environment fluctuation detected" events.

Triton Kernel Dedup Tool

New tools/triton_kernel_extractor/kernel_dedup.py (invoked via dedup subcommand). Performs kernel-level dedup by hashing Triton source code content, complementary to graph-level graph_hash.txt dedup:

| Graph Type | Samples | Total Kernels | Unique | Dedup Rate |
| --- | --- | --- | --- | --- |
| Forward typical | 24 | 24 | 21 | 12.5% |
| Backward typical | 15 | 15 | 15 | 0% |
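
Content-based kernel dedup can be sketched as follows. This is not the code in kernel_dedup.py: the function names are hypothetical, and the normalization rules (dropping blank lines, comment-only lines, and trailing whitespace) are an assumption. The `dedup_rate` formula, (total - unique) / total, is inferred from the table, where 24 total and 21 unique kernels give 12.5%.

```python
import hashlib
from collections import defaultdict


def normalize_source(source: str) -> str:
    """Drop blank lines, comment-only lines and trailing whitespace (assumed rules)."""
    lines = []
    for line in source.splitlines():
        stripped = line.rstrip()
        if not stripped or stripped.lstrip().startswith("#"):
            continue
        lines.append(stripped)
    return "\n".join(lines)


def dedup_kernels(kernels: dict[str, str]) -> dict[str, list[str]]:
    """Group kernel names by the SHA-256 of their normalized source."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, source in kernels.items():
        key = hashlib.sha256(normalize_source(source).encode("utf-8")).hexdigest()
        groups[key].append(name)
    return dict(groups)


def dedup_rate(total: int, unique: int) -> float:
    """Fraction of kernels that are duplicates of another kernel."""
    return (total - unique) / total if total else 0.0


# Stand-in sources; real inputs would be Triton kernel files.
kernels = {
    "sample_a/kernel.py": "# generated\ndef k(x):\n    return x + 1\n",
    "sample_b/kernel.py": "def k(x):\n    return x + 1   \n",
    "sample_c/kernel.py": "def k(x):\n    return x * 2\n",
}
groups = dedup_kernels(kernels)
print(len(kernels), len(groups), dedup_rate(len(kernels), len(groups)))
```

Hashing kernel source rather than model.py lets two different graphs that compile to the same kernel count as one, which is why this is complementary to graph-level dedup.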

Changed Files

| File | Change |
| --- | --- |
| `graph_net/torch/sample_pass/backward_graph_extractor.py` | `module.eval()` + BN param filtering + `detach().clone()` |
| `graph_net/torch/extractor.py` | Auto-generate `graph_hash.txt` on save |
| `graph_net_bench/torch/test_compiler.py` | Support nested list/tuple outputs |
| `tools/triton_kernel_extractor/kernel_dedup.py` | New: Triton kernel source content dedup tool |
| `tools/triton_kernel_extractor/__main__.py` | New `dedup` subcommand |

paddle-bot · 2026-05-15T09:42:33Z

Thanks for your contribution!

This commit introduces backward graph generation pipeline integrated with GraphNet's test_compiler framework.

Changes:
- graph_net/torch/extractor.py: add try/except for capture_sparse_compute to support PyTorch versions where the config does not exist.
- graph_net/torch/sample_pass/backward_graph_extractor.py:
  - switch module from train() to eval() to avoid dropout/BN side effects
  - clone forward inputs with detach().clone() to avoid inplace modification
  - add _is_pure_shape_graph() to skip subgraphs with only shape ops
- tools/backward_graph_test.py:
  - batch backward FX Graph generation via aot_autograd
  - integrated test_compiler validation with auto-generated weight_meta.py
  - default GRAPH_NET_FLUCTUATION_DETECT_THRESHOLD=0.5 and trials=10
- tools/backward_kernel_dedup.py:
  - Triton kernel dedup analysis for backward graphs

Xreki

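For which types of samples does this PR fix backward graph generation? Please describe them with examples in the PR description. The change in the backward graph generation success rate after applying this PR should also be included in the PR description.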

Xreki · 2026-05-18T02:26:23Z

            self.model_path, use_dummy_inputs=False, device=self.device
        )
-        module.train()
+        module.eval()


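No backward graph is generated in eval mode, is it?

model.eval() does not disable gradient computation; only torch.no_grad() / torch.inference_mode() do. eval only changes the forward behavior of specific layers (dropout → identity, BatchNorm → uses running stats instead of batch stats), and backpropagation works entirely as normal. Using eval mode is in fact the better choice here.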

Xreki · 2026-05-18T02:26:47Z

            self.model_path, use_dummy_inputs=False, device=self.device
        )
-        module.train()
+        module.eval()


There is no backward graph in eval mode, is there?

Xreki · 2026-05-18T02:27:38Z

-        module.train()
+        module.eval()
+
+        if self._is_pure_shape_graph(module):


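Cases like this can never be enumerated exhaustively; I do not recommend adding this kind of check.

Agreed, removed. For pure-shape subgraphs (containing only view/reshape/transpose and similar ops), backward capture naturally returns nothing because no output tensor is differentiable, so no extra preprocessing step is needed to skip them.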

Xreki · 2026-05-18T02:30:37Z

@@ -0,0 +1,538 @@
+#!/usr/bin/env python3


The script https://github.com/PaddlePaddle/GraphNet/blob/develop/graph_net/test/backward_graph_extractor.sh can be used directly for testing; there is no need to add a separate unit test.

Removed tools/backward_graph_test.py.

Xreki · 2026-05-18T02:33:02Z

+
+
+def main():
+    parser = argparse.ArgumentParser(description="Backward kernel dedup analysis.")


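What kind of backward kernel dedup does this code perform? Does it dedup by the graph_hash.txt of model.py? That does not need extra code either; the existing code already supports it.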

…one tools
- Remove _is_pure_shape_graph() from backward_graph_extractor.py per reviewer feedback (incomplete op whitelist, not maintainable)
- Remove tools/backward_graph_test.py (use existing shell script graph_net/test/backward_graph_extractor.sh for batch processing)
- Remove tools/backward_kernel_dedup.py (use existing graph_hash.txt based dedup in graph_net/tools/deduplicated.py)

…tent

Add `kernel_dedup.py` and wire it as a `dedup` subcommand under `tools.triton_kernel_extractor`. This performs kernel-level dedup by hashing normalized Triton kernel source (triton_poi_fused_xxx.py), which is complementary to the existing graph-level dedup via graph_hash.txt.

Signed-off-by: Dayuxiaoshui <792179245@qq.com>

Signed-off-by: Dayuxiaoshui <792179245@qq.com>

- test_compiler: handle list/tuple outputs from backward graphs recursively in _align_output_device and output wrapping logic
- extractor: generate graph_hash.txt from model.py content when saving

Signed-off-by: Dayuxiaoshui <792179245@qq.com>

Dayuxiaoshui force-pushed the develop branch from 0afddc3 to 67a81d6 · May 15, 2026 09:49

Xreki reviewed May 18, 2026


Dayuxiaoshui added 4 commits May 18, 2026 11:40

Fix black/ruff formatting in kernel_dedup.py

af8fd20

Signed-off-by: Dayuxiaoshui <792179245@qq.com>

Dayuxiaoshui wants to merge 5 commits into PaddlePaddle:develop from Dayuxiaoshui:develop


2 participants


